Surface Networks
We study data-driven representations for three-dimensional triangle meshes,
which are one of the prevalent objects used to represent 3D geometry. Recent
works have developed models that exploit the intrinsic geometry of manifolds
and graphs, namely Graph Neural Networks (GNNs) and their spectral variants,
which learn from the local metric tensor via the Laplacian operator. Despite
offering excellent sample complexity and built-in invariances, intrinsic
geometry alone is invariant to isometric deformations, making it unsuitable for
many applications. To overcome this limitation, we propose several upgrades to
GNNs to leverage extrinsic differential geometry properties of
three-dimensional surfaces, increasing their modeling power.
In particular, we propose to exploit the Dirac operator, whose spectrum
detects principal curvature directions --- this is in stark contrast with the
classical Laplace operator, which directly measures mean curvature. We coin the
resulting models \emph{Surface Networks (SN)}. We prove that these models
define shape representations that are stable to deformation and to
discretization, and we demonstrate the efficiency and versatility of SNs on two
challenging tasks: temporal prediction of mesh deformations under non-linear
dynamics and generative models using a variational autoencoder framework with
encoders/decoders given by SNs.
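
To make the propagation pattern concrete, here is a minimal sketch of an intrinsic (Laplacian-based) layer of the kind the abstract contrasts with the Dirac upgrade. It uses a plain graph Laplacian as a simplified stand-in for the geometric (cotangent) Laplacian a real Surface Network would use, and the names mesh_laplacian and laplacian_layer are illustrative, not from the paper.

    import numpy as np

    def mesh_laplacian(num_vertices, edges):
        # Unnormalized graph Laplacian L = D - A; a simplified stand-in
        # for the geometric Laplacian an actual Surface Network would use.
        A = np.zeros((num_vertices, num_vertices))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0
        return np.diag(A.sum(axis=1)) - A

    def laplacian_layer(H, L, W_id, W_lap):
        # One propagation step: combine each vertex's features with their
        # Laplacian-filtered neighborhood, then apply a ReLU.
        return np.maximum(H @ W_id + (L @ H) @ W_lap, 0.0)

    # Toy usage on a single triangle (3 vertices, 3 edges, 4 features).
    rng = np.random.default_rng(0)
    L = mesh_laplacian(3, [(0, 1), (1, 2), (2, 0)])
    H = rng.normal(size=(3, 4))
    out = laplacian_layer(H, L, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
    print(out.shape)  # (3, 8)

Per the abstract, swapping the Laplacian L in such a layer for the Dirac operator is what lets the model see principal curvature directions rather than only mean curvature.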
Efficient Online Reinforcement Learning with Offline Data
Sample efficiency and exploration remain major challenges in online
reinforcement learning (RL). A powerful approach that can be applied to address
these issues is the inclusion of offline data, such as prior trajectories from
a human expert or a sub-optimal exploration policy. Previous methods have
relied on extensive modifications and additional complexity to ensure the
effective use of this data. Instead, we ask: can we simply apply existing
off-policy methods to leverage offline data when learning online? In this work,
we demonstrate that the answer is yes; however, a set of minimal but important
changes to existing off-policy RL algorithms is required to achieve reliable
performance. We extensively ablate these design choices, demonstrating the key
factors that most affect performance, and arrive at a set of recommendations
that practitioners can readily apply, whether their data comprise a small
number of expert demonstrations or large volumes of sub-optimal trajectories.
We see that correct application of these simple recommendations can provide an
improvement over existing approaches across a diverse set of competitive
benchmarks, with no additional computational overhead.
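
As an illustration of what a "minimal change" can look like, below is a hedged sketch of one plausible recommendation: sampling each training batch half from the offline dataset and half from the online replay buffer. This symmetric-sampling recipe is an assumption made for illustration; the paper's actual recommendations are spelled out in the text.

    import random

    def sample_symmetric(offline_data, online_buffer, batch_size):
        # Assumed design choice: draw 50% of each batch from offline data
        # and 50% from online experience, so early training neither
        # ignores the prior data nor drowns out fresh interaction.
        half = batch_size // 2
        batch = random.sample(offline_data, half)
        batch += random.sample(online_buffer, batch_size - half)
        random.shuffle(batch)
        return batch

    # Toy usage with transitions represented as plain tuples.
    offline = [("s", "a", 0.0, "s2")] * 100
    online = [("s", "a", 1.0, "s2")] * 100
    print(len(sample_symmetric(offline, online, 32)))  # 32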
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing
We present a system that enables an autonomous small-scale RC car to drive
aggressively from visual observations using reinforcement learning (RL). Our
system, FastRLAP (faster lap), trains autonomously in the real world, without
human interventions, and without requiring any simulation or expert
demonstrations. Our system integrates a number of important components to make
this possible: we initialize the representations for the RL policy and value
function from a large prior dataset of other robots navigating in other
environments (at low speed), which provides a navigation-relevant
representation. From here, a sample-efficient online RL method uses a single
low-speed user-provided demonstration to determine the desired driving course,
extracts a set of navigational checkpoints, and autonomously practices driving
through these checkpoints, resetting automatically on collision or failure.
Perhaps surprisingly, we find that with appropriate initialization and choice
of algorithm, our system can learn to drive over a variety of racing courses
with less than 20 minutes of online training. The resulting policies exhibit
emergent aggressive driving skills, such as timing braking and acceleration
around turns and avoiding areas which impede the robot's motion, approaching
the performance of a human driver using a similar first-person interface over
the course of training.
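
A rough sketch of the autonomous-practicing loop described above, under stated assumptions: the env object, its info fields (position, collision), and the 0.5 m reach radius are hypothetical placeholders, not FastRLAP's actual interface.

    import numpy as np

    def checkpoint_reward(position, checkpoints, idx, reach_radius=0.5):
        # Reward is negative distance to the current checkpoint; the index
        # advances once the car gets within the (hypothetical) reach radius.
        dist = np.linalg.norm(np.asarray(position) - np.asarray(checkpoints[idx]))
        return -dist, idx + 1 if dist < reach_radius else idx

    def practice_episode(env, policy, checkpoints):
        # Drive through the checkpoints extracted from the demonstration,
        # resetting automatically when a collision is detected.
        obs, info = env.reset()
        idx = 0
        while idx < len(checkpoints) and not info.get("collision", False):
            obs, info = env.step(policy(obs, checkpoints[idx]))
            reward, idx = checkpoint_reward(info["position"], checkpoints, idx)
        return idx  # number of checkpoints cleared before reset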
Training Diffusion Models with Reinforcement Learning
Diffusion models are a class of flexible generative models trained with an
approximation to the log-likelihood objective. However, most use cases of
diffusion models are not concerned with likelihoods, but instead with
downstream objectives such as human-perceived image quality or drug
effectiveness. In this paper, we investigate reinforcement learning methods for
directly optimizing diffusion models for such objectives. We describe how
posing denoising as a multi-step decision-making problem enables a class of
policy gradient algorithms, which we refer to as denoising diffusion policy
optimization (DDPO), that are more effective than alternative reward-weighted
likelihood approaches. Empirically, DDPO is able to adapt text-to-image
diffusion models to objectives that are difficult to express via prompting,
such as image compressibility, and those derived from human feedback, such as
aesthetic quality. Finally, we show that DDPO can improve prompt-image
alignment using feedback from a vision-language model without the need for
additional data collection or human annotation.
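
To pin down the core idea, here is a minimal sketch of a REINFORCE-style (score-function) form of the DDPO objective, assuming a diffusion sampler that exposes per-step log-probabilities; the function name and tensor shapes are illustrative, not the paper's reference implementation.

    import torch

    def ddpo_sf_loss(step_log_probs, rewards):
        # Each denoising step x_t -> x_{t-1} is treated as an action, so a
        # sampled trajectory's log-probability is the sum of its step
        # log-probs. Weighting that by the (detached) downstream reward
        # gives a standard score-function policy gradient.
        traj_log_prob = step_log_probs.sum(dim=1)        # (batch,)
        return -(traj_log_prob * rewards.detach()).mean()

    # Toy usage: 4 trajectories of 10 denoising steps with random rewards.
    step_log_probs = torch.randn(4, 10, requires_grad=True)
    rewards = torch.tensor([0.2, 1.0, -0.3, 0.5])
    ddpo_sf_loss(step_log_probs, rewards).backward()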